Vector and Cache Performance of Occam

Author

R. W. FORD

Abstract

This paper compares the performance of a recent version of the OCCAM ocean model and a more vector-oriented version on two architectures: a parallel cache-based SMP (an SGI Challenge) and a parallel vector machine (a Fujitsu VPP 700). OCCAM was originally developed to run efficiently on cache-based parallel platforms, with little attention paid to its vector performance. However, with the continuing price/performance levels of the latest parallel vector architectures and their continued use in meteorology, an efficient vector version of OCCAM may prove useful. Given the potential for blocking vector codes to improve cache performance, and the recent implementation of streaming to improve the vector performance of caches, it may be possible to obtain a single implementation of OCCAM that is efficient on both types of architecture. Preliminary results show the potential for such an approach.

Similar resources

Characteristics of an On-Chip Cache on NEC SX Vector Architecture

Thanks to their highly effective memory bandwidth, vector systems can achieve high computational efficiency for computation-intensive scientific applications. However, they have been encountering the memory wall problem, and the effective memory bandwidth rate has decreased, resulting in a decrease in the bytes per flop rates of recent vector systems from 4 (SX-7 and SX-8) to 2 (SX...


Data Cache Performance When Vector-Like Accesses Bypass the Cache

A Stream Memory Controller, when added to a conventional memory hierarchy, routes vector-like accesses around the data cache. A memory system was simulated under these conditions and the data cache performance increased dramatically. The gain in performance was a result of the increased temporal locality of the access pattern. The access pattern also showed a decrease in spatial locality, makin...


A Vector C and Fortran Compiler for the FPS T-Series: Experiences with Compiling to occam I

We describe our implementation of C and Fortran preprocessors for the FPS T-series hypercube. The target of these preprocessors is the occam I language. We provide a brief overview of the INMOS transputer and the Weitek vector processing unit (VPU). These two units comprise one node of the T-series. Some depth of understanding of the VPU is required to fully appreciate the problems encountered ...


Performance Modeling and Analysis of Cache Blocking in Sparse Matrix Vector Multiply

We consider the problem of building high-performance implementations of sparse matrix-vector multiply (SpM×V), or y = y + A·x, which is an important and ubiquitous computational kernel. Prior work indicates that cache blocking of SpM×V is extremely important for some matrix and machine combinations, with speedups as high as 3x. In this paper we present a new, more compact data structure for cach...
